Exploit Temporal Locality of Shared Data in SRC Enabled CMP
نویسندگان
چکیده
By run-time characteristic analysis of parallel workloads, we found that a majority of shared data accesses of parallel workload has temporal locality. Based on this characteristic, we present a sharing relation cache (SRC for short) based CMP architecture, saving recently used sharing relations to provide destination set information for following cache-to-cache miss requests. Token-SRC protocol integrates SRC into token protocol,reducing network traffic of token protocol.Simulations using SPLASH-2 benchmarks show that, a 16-core CMP system with tokenSRC achieved average 15% network traffic reduction of that with token protocol.
منابع مشابه
Towards zero latency photonic switching in shared memory networks
Photonic networks-on-chip based on silicon photonics have been proposed to reduce latency and power consumption in future chip multi-core processors (CMP). However, high performance CMPs use a shared memory model which generates large numbers of short messages, creating high arbitration latency overhead for photonic switching networks. In this paper we explore techniques which intelligently use...
متن کاملIs Reuse Distance Applicable to Data Locality Analysis on Chip Multiprocessors?
On Chip Multiprocessors (CMP), it is common that multiple cores share certain levels of cache. The sharing increases the contention in cache and memory-to-chip bandwidth, further highlighting the importance of data locality analysis. As a rigorous and hardware-independent locality metric, reuse distance has served for a variety of locality analysis, program transformations, and performance pred...
متن کاملCASH: Revisiting Hardware Sharing in Single-Chip Parallel Processors
As the increasing of issue width has diminishing returns with superscalar processor, thread parallelism with a single chip is becoming a reality. In the past few years, both SMT (Simultaneous MultiThreading) and CMP (Chip MultiProcessor) approaches were first investigated by academics and are now implemented by the industry. In some sense, CMP and SMT represent two extreme design points. In thi...
متن کاملProgram Transformations for Cache Locality Enhancement on Shared - memory
Program Transformations for Cache Locality Enhancement on Shared-memory Multiprocessors Naraig Manjikian Doctor of Philosophy Graduate Department of Electrical and Computer Engineering University of Toronto 1997 This dissertation proposes and evaluates compiler techniques that enhance cache locality and consequently improve the performance of parallel applications on shared-memory multiprocesso...
متن کاملExploiting Temporal and Spatial Constraints on Distributed Shared Objects
The advent of gigabit network technologies has made it possible to combine sets of uni-and multiprocessor workstations into a distributed, massively-parallel computer system. Middleware, such as distributed shared objects (DSO), attempts to improve programmability of such systems, by providing globally accessible 'object' abstractions. Early research on distributed shared object systems concern...
متن کامل